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Computer simulations have become an important tool across the biomedical 
sciences and beyond. For many important problems several different models or 
hypotheses exist and choosing which one best describes reality or observed data is 
not straightforward. We therefore require suitable statistical tools that allow us to 
choose rationally between different mechanistic models of e.g. signal transduction 
or gene regulation networks. This is particularly challenging in systems biology 
where only a small number of molecular species can be assayed at any given time 
and all measurements are subject to measurement uncertainty. Here we develop 
such a model selection framework based on approximate Bayesian computation and 
employing sequential Monte Carlo sampling. We show that our approach can be 
applied across a wide range of biological scenarios, and we illustrate its use on 
real data describing influenza dynamics and the JAK-STAT signalling pathway. 
Bayesian model selection strikes a balance between the complexity of the simulation 
models and their ability to describe observed data. The present approach enables 
us to employ the whole formal apparatus to any system that can be (efficiently) 
simulated, even when exact likelihoods are computationally intractable. 



1 Introduction 

Mathematical models are widely used to describe and analyze complex systems and 
processes. Formulating a model to describe, e.g. a signalling pathway or host par- 
asite system, requires us to condense our assumptions and knowledge into a single 
coherent framework [1]. Mathematical analysis and computer simulations of such 
models then allow us to compare model predictions with experimental observations 
in order to test, and ultimately improve these models. The continuing success, for 
example of systems biology, relies on the judicious combination of experimental and 
theoretical lines of argument. 



Because many of the mathematical models in biology (as in many other disciplines) 
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are too complicated to be analyzed in a closed form, computer simulations have 
become the primary tool in the quantitative analysis of very large or complex bio- 
logical systems. This, however, can complicate comparisons of different candidate 
models in light of (frequently sparse and noisy) observed data. Whenever proba- 
bilistic models exist, we can employ standard model selection approaches of either 
a frequentist, Bayesian, or information theoretic nature [2,3]. But if suitable prob- 
ability models do not exist, or if the evaluation of the likelihood is computationally 
intractable, then we have to base our assessment on the level of agreement between 
simulated and observed data. This is particularly challenging when the parameters 
of simulation models are not known but must be inferred from observed data as 
well. Bayesian model selection side-steps or overcomes this problem by marginal- 
izing (that is integrating) over model parameters, thereby effectively treating all 
model parameters as nuisance parameters. 

For the case of parameter estimation when likelihoods are intractable, approximate 
Bayesian computation (ABC) frameworks have been applied successfully [4-9]. In 
ABC the calculation of the likelihood is replaced by a comparison between the ob- 
served data and simulated data. Given the prior distribution P(9) of parameter 
9, the goal is to approximate the posterior distribution, P(9\Dq) oc f(Do\9)P(9), 
where f(Do\6) is the likelihood of 9 given the data Dq. ABC methods have the 
following generic form: 

1 Sample a candidate parameter vector 9* from prior distribution P{9). 

2 Simulate a data set D* from the model described by a conditional probability 
distribution f(D\9*). 

3 Compare the simulated data set, D* , to the experimental data, Do, using 
a distance function, d, and tolerance e; if d(Do,D*) < e, accept 8*. The 
tolerance e > is the desired level of agreement between Dq and D*. 

The output of an ABC algorithm is a sample of parameters from the distribution 
P(9\d(Do, D*) < e). If e is sufficiently small then this distribution will be a good 
approximation for the "true" posterior distribution, P(9\Dq). A tutorial on ABC 
methods is available in the Suppl. Material. 

Such a parameter estimation approach can be used whenever the model is known. 
However, when several plausible candidate models are available we have a model 
selection problem, where both the model structure and parameters are unknown. In 
the Bayesian framework, model selection is closely related to parameter estimation, 
but the focus shifts onto the marginal posterior probability of model m given data 
Do, 

P{mm ~ PiDo) 

where P(Do\m) is the marginal likelihood and P(m) the prior probability of the 
model [10]. This framework has some conceptual advantages over classical hy- 
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pothesis testing: for example, we can rank an arbitrary number of different non- 
nested models by their marginal probabilities; and rather than only considering ev- 
idence against a model the Bayesian framework also weights evidence in a model's 
favour [11]. In practical applications, however, a range of potential pitfalls need con- 
sidering: model probabilities can show strong dependence on model and parameter 
priors; and the computational effort needed to evaluate these posterior distributions 
can make these approaches cumbersome. 

The computationally expensive step in Bayesian model selection is the evaluation of 
the marginal likelihood, which is obtained by marginalizing over model parameters; 
i.e. P(Do\m) = J f(Do\m,9)P(9\m)d9, where P(9\m) is the parameter prior for 
model m. Here we develop a computationally efficient ABC model selection formal- 
ism based on a sequential Monte Carlo (SMC) sampler. We show that our ABC 
SMC procedure allows us to employ the whole paraphernalia of the Bayesian model 
selection formalism, and we illustrate the use and scope of our new approach in a 
range of models: chemical reaction dynamics, Gibbs random fields, and real data 
describing influenza spread and JAK-STAT signal transduction. 



2 ABC for model selection 

Our goal is to estimate the marginal posterior distribution of a model, P(m\Do), 
and in this section we explain two ways in which this problem can be approached. 
In the joint space based approach we define a joint space of model indicators, m = 
1, 2, . . . , and corresponding model parameters, 9, obtain the joint posterior 
distribution over the combined space of models and parameters, P(9, m\Do), and 
finally marginalize over parameters to obtain P(m\Do). In the second, marginal 
likelihood based approach, we estimate marginal likelihoods (also called the evidence), 
P(Do\m), for each given model, and use these to calculate the marginal posterior 
model distributions through 

Both approaches have been applied under the ABC rejection scheme, which is com- 
putationally prohibitive for models with even an only moderate number of parame- 
ters [12,13]. Here we incorporate ideas from SMC to both of the above approaches, 
making them computationally more efficient. In this section we present only the 
more powerful approach ABC SMC model selection on the joint space. We refer the 
reader to the Suppl. Material for derivations and details, as well as discussion on 
the ABC SMC model selection algorithm based on the marginal likelihood approach. 

In model selection based on ABC rejection we adapt the basic ABC procedure 
(presented in the introduction) to the joint space, where particles (m, 9) consist of a 
model indicator m and a parameter 9. The ABC rejection model selection algorithm 
on the joint space proceeds as follows [13]: 
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1 Draw m* from the prior P(m). 



2 Sample 9* from the prior P(9\m*). 

3 Simulate a candidate data set D* ~ f(D\9*,m*). 

4 Compute the distance. If d(Do,D*) < e, accept (m*,9*), otherwise reject it. 

5 Return to 1. 

Once a sample of N particles has been accepted, the marginal posterior distribution 
is approximated by 

. ^accepted particlesfm', .) 
P(m = m \D ) ~ — • 

In the ABC SMC model selection algorithm on the joint space, particles (parame- 
ter vectors) {(mi,#i), . . . , (tun^n)} are sampled from the prior distribution, P(m, 9), 
and propagated through a sequence of intermediate distributions, P(m, 9\d(Do, D*) < 
€i), i = 1, ...,T — 1, until they represent a sample from the target distribution, 
P(m, 9\d(Do, D*) < ex)- The tolerances are chosen such that e± > . . . > > 0, 
and the distributions thus gradually evolve towards the target posterior distribution. 



The algorithm is presented below (and explained in the Suppl. Tutorial). 

ABC SMC model selection algorithm on the joint space 

MSI Initialize e\, . . . , ex- 

Set the population indicator t = 1. 

MS2.0 Set the particle indicator i = 1. 

MS2.1 If t = 1, sample (m**,9**) from the prior distribution P(m,9). 

If t > 1, sample m* with probability P t -\{rn*) and draw m** ~ KM t {m\m*). 
Sample 9* from previous population {9(m**) t -i} with weights Wt-\ and draw 
9** ~ KP t , m **(0\0*). 
If P(m**,9**) = 0, return to MS2.1. 

Simulate a candidate data set ~ f(D\m** ,9**) B t times (b = l,...,B t ) 

and calculate bt(m** ,9**). 

If b t (m**,9**) = 0, return to MS2.1. 

MS2.2 Set (m t (i) ,6> t W ) = (m**,fl**) and calculate the weight of the particle as 

f , , (i) M. 

w t ( m t >0t ) — 



5 U t 



P(m« #)M<V ; 



(0 fl(*)> 



it t = l 
if t > 1. 
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where 



b t {r4\e?) 




6=1 



\M\ 



s 



3=1 



(i) P t _i(m t _i = mf } ) 



If z < AT set z = i + 1, go to MS2.1. 



MS3 Normalize the weights wt- 

Sum the particle weights to obtain marginal model probabilities, 



P t (m t = m)= w ? ("4° , e ? ) • 




If t < T, set t = t + 1, go to MS2.0. 

Particles sampled from a previous distribution are denoted by a single asterisk, and 
after perturbation by a double asterisk. KM is a model perturbation kernel which 
allows us to obtain model m from model m* and KP is the parameter perturba- 
tion kernel. Bt > 1 is the number of replicate simulation run for a fixed particle 
(for deterministic models B t = 1) and \M\ denotes the number of candidate models. 

The output of the algorithm, i.e. the set of particles {(mr,^)} associated with 
weights wt, is the approximation of the full posterior distribution on the joint model 
and parameter space. The approximation of the marginal posterior distribution of 
the model obtained by marginalization is 



and we can also straightforwardly obtain the marginalized parameter distributions. 

The algorithm requires the user to define the prior distribution, distance function, 
tolerance schedule and perturbation kernels. In all examples presented in the results 
section we choose uniform prior distributions for all parameters and models; that is 
all models are a priori equally plausible. Such priors are informative in a sense that 
they define a feasible parameter region (e.g. reaction rates are positive), but they 
are predominantly non-informative as they do not specify any further preference for 
particular parameter values. This way the inference will mostly be informed by the 
information contained in the data. A good tolerance can be found empirically by 




w t (m T ,V T ), 



5 



trying to reach the lowest distance feasible and arrive at the posterior distribution 
in a computationally efficient way. Our perturbation kernels are component- wise 
truncated uniform or Gaussian and are automatically adapted by feeding back in- 
formation on the obtained parameter ranges from the previous population. Distance 
functions are defined for each model as specified in the results section. The algo- 
rithm presented in Toni et al. [8] is a special case of the above algorithm for discrete 
uniform KM kernel and uniform prior distribution of the model P{m). 

3 Results 

In this section we illustrate ABC SMC for model selection on a simple example of 
stochastic reaction kinetics. We then compare the computational efficiency of ABC 
SMC for stochastic models of Gibbs random fields with that of the ABC rejection 
model selection method. Finally, we apply the algorithm to several real datasets: 
first we select between different stochastic models of influenza epidemics (where we 
can compare our approach with previously published results obtained using exact 
Bayesian model selection), and then apply our approach to choose from among 
different mechanistic models for the STAT5 signaling pathway. 

3.1 Chemical Reaction Kinetics 

fci 

We illustrate our algorithm for the stochastic reaction kinetic models X + Y — > 2Y 

and X — > Y. The first is a model of an autocatalytic reaction, where the reaction 
product Y is the catalyst for the reaction. In the second, molecules Y do not need to 
be present for a change from X to Y to occur. Such models have, for example, been 
considered in the context of prion replication dynamics [14,15], where X represents 
a healthy form of a prion protein and Y a diseased form. 

We simulate synthetic datasets of Y measured at 20 time points using Gillespie 
algorithm [16] from model 2 with parameter &2 = 30 and initial conditions Xq = 40, 
Yq = 3 (Figure 1(a), Suppl. Table 1). We apply our ABC SMC algorithm for model 
selection, which identifies the correct model with high confidence (Figure 1(b)). 

3.2 Gibbs random fields 

Gibbs random fields have become staple models in machine learning, including appli- 
cations in computational biology and bioinformatics (see for example [13,17]). Here 
we use two Gibbs random field models [18] , for which closed form posterior distribu- 
tions are available. This allows us to compare the ABC SMC approximated posterior 
distributions of the models to true posterior distribtuions, and to demonstrate the 
computational efficiency of our approach when compared to model selection based 
on ABC rejection sampling. 

Both models, tuq and mi, are defined on a sequence of n binary random variables, 
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Figure 1: (a) Stochastic trajectories of species X (red) and Y (blue). Model 1 is simulated 
for k\ — 2.1 (dashed line), model 2 for k^ — 30 (full line). Data points are represented by 
circles, (b) We have repeated the model selection run 20 times; the red sections present 
25% and 75% quantiles around the median. Prior distribution P(m) is chosen uniform and 
k-y, &2 ~ C^(0, 100). Perturbation kernels are chosen as follows: KP t (k\k*) = k* + U(—a, a), 
a = 2(max{fc} t _i — min{fc} t _i) and KM t {m\m*) = 0.7 if m = m* and 0.3 otherwise. 
Number of particles N — 1000. B t = 1. Distance function is mean squared error and 
tolerance schedule e = {3000, 1400,600, 140,40}. 



x = (xi, . . . , x n ), Xi G {0, 1}; tuq is a collection of n iid Bernoulli random variables 
with probability 0q/(1 + exp(#o)); mi is equivalent to a standard Ising model, i.e. 
x\ is taken to be a binary random variable and P(xi + i = Xi\x{) = #i/(l + exp(#i)) 
for i = 2, . . . , x n . The likelihood functions are 

e e S (x) e 6iSi(x) 

fo(x|^n) = T a~^— and f\{x\8i) = — 5— — -. 

JKJ \ 1 u ' n _|_ e e \n 1 L > 2(l + e ei ) n_1 

where So(x) = X^r=i M x i = 1) an d S\(x) = Y^l=2 = are sufficient statis- 

tics, respectively. 

We simulate 1000 datasets from both models for different values of parameters 
6 ~ £7(-5,5), e x ~ 17(0, 6) and n = 100. Using ABC SMC for model selec- 
tion allows us to estimate posterior model distributions correctly and demonstrate 
a considerable computational speed-up in ABC SMC compared to ABC rejection 
(Figure 2). 



3.3 Infuenza infection outbreaks 

We next apply ABC SMC for model selection to models of the spread of different 
strains of the influenza virus. We use data from influenza A (H3N2) outbreaks that 
occurred in 1977-78 and 1980-81 in Tecomseh, Michigan [19] (Suppl. Table 2), and 
a second dataset of an influenza B infection outbreak in 1975-76 and influenza A 
(H1N1) infection outbreak in 1978-79 in Seattle, Washington [20] (Suppl. Table 3). 
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Figure 2: (a) True vs. inferred posterior model distribution. In ABC SMC we use the 
Euclidian distance d(D , x) = ^(S (D ) - S (x)) 2 + (S^Dq) - S^x)) 2 . N = 500. B t = 1. 
Tolerance schedule: e = {9,4,3,2,1,0}. Perturbation kernels: KM t (m\m*) = 0.75 if m = 
m* and 0.25 otherwise; KP t {6\6*) = 9* + U(-a,a), a = 0.5(max{6»} t _i - min{0} 4 _i). We 
have excluded those datasets for which all states are in or 1 (for which P(m = 0) rj 0.3094 is 
also correctly inferred) from the analysis, (b) Comparison of the number of simulation steps 
needed by ABC rejection (nuej) and ABC SMC («smc); ABC SMC yields an approximately 
50-fold speed-up on average. 



The basic questions to be addressed here are whether (i) different outbreaks of the 
same strain and (ii) outbreaks of different molecular strains of the influenza virus 
can be described by the same model of disease spread. 

We assume that virus can spread from infected to susceptible individuals and distin- 
guish between spread inside households or across the population at large [20]. Let 
q c denote the probability that a susceptible individual does not get infected from 
the community and the probability that a susceptible individual escapes infection 
within their household. Then Wj s , the probability that j out of the s susceptibles 
in a household become infected, is given by 

fynM h y- j , (1) 

where wq s = q s c , s = 0,1,2,..., and Wjj = 1 — YliZo w ij- We are interested in 
inferring the pair of parameters q^ and q c of the model (1) using the data from 
Suppl. Table 2. These data were obtained from two separate outbreaks of the 
same strain, H3N2, and the question of interest is whether these are characterized 
by the same epidemiological parameters (this question was previously considered 
in [21,22]). To investigate this issue, we consider two models: one with four pa- 
rameters, qhi, q&i Qh2, Qc2, which describes the hypothesis that each outbreak has 
its own characteristics; the second models the hypothesis that both outbreaks share 
the same epidemiological parameter values for q^ and q c - Prior distributions of all 
parameters are chosen to be uniform over the range [0, 1]. 
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To apply ABC SMC, we use a distance function 

d(D ,D*) = ^(\\D l -D*(q hl ,q cl )\\ F + \\D 2 -D*(q h2 ,q c2 )\\ F ), 

where \ \ \\f denotes the Frobenious norm, Do = D\ U D 2 with D\ the 1977-78 out- 
break and D 2 the 1980-81 outbreak datasets from Suppl. Table 2, and D* is the 
simulation output from model (1). The results we obtain are sumarized in Figure 
3(a) - 3(b) and strongly suggest that the two outbreaks appear to have shared the 
same epidemiological characteristics. Figure 3(a) shows the posterior distribution of 
the four-parameter model. The marginal posterior distributions of qhi and q& are 
largely overlapping with the marginal posterior distributions of q^ 2 and q c2 and we 
therefore, unsurprisingly, get strong evidence in favour of the two-parameter model. 
Figure 3(b) shows the marginal posterior distribution of the model; the posterior 
probability of model 1 is 0.98 (median over 10 runs), which gives unambiguous sup- 
port to model 1, meaning that outbreaks of the same strain share the same dynamics. 

Outbreaks due to a different viral strain (Suppl. Table 3) have different characteris- 
tics as indicated by the posterior distribution of the four-parameter model presented 
in Figure 3(c). This was confirmed by applying our model selection algorithm; the 
inferred posterior marginal model probability of a two-parameter model was negli- 
gible (results not shown). From Figure 3(c) we also see that these differences are 
due to differences in viral spread across the community whereas within-household 
dynamics are comparable. We thus explore a further model with three parameters, 
Qd, Qc2, Qh (model 1), where the two outbreaks share the same within- household 
characteristics (qh), and compare it against and the four-parameter model (model 
2). The obtained Bayes factor suggests that there is only very week evidence in 
favour of model 1 (Figure 3(d)), which is in agreement with the result of [21]. 

In general genetic predisposition, differences in immunity and lifestyle etc. will 
lead to heterogeneity in susceptibility to viral infection among the host population. 
Such a model can be written as [22] 

^(,)=Wi'(i-,rv.. (2) 

i=0 ^ ' 

On the basis of the previous results, we combine both outbreak data sets from Suppl. 
Table 2, and find some evidence that model (2) explains the data better than model 
(1), suggesting that the host-virus dynamics are shaped by the molecular nature of 
the viral strain, as well as by variability in the host population (see Suppl. Figure 
2). 

3.4 JAK-STAT signaling pathway 

Having convinced ourselves that the novel ABC SMC model selection approach 
agrees with the analytical model probabilities, and those obtained using conven- 
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Figure 3: (a) ABC SMC posterior distributions for parameters inferred for a four-parameter 
model from the data in Suppl. Table 2. Marginal posterior distributions of parameters q c i, 
qhi (red) and q C 2, qh2 (blue), (b) Estimation of a posterior marginal distribution P{m\D$). 
Model 1 is a two-parameter and model 2 a four-parameter model (1). All intermediate 
populations are shown in Suppl. Figure 1(a). (c) The same as (a) but here the data used 
is from Suppl. Table 3. (d) Estimation of a posterior marginal distribution. Model 1 is a 
two-parameter and model 2 a three-parameter model (1). All intermediate populations are 
shown in Suppl. Figure 1(b). 
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tional Bayesian model selection, while outperforming conventional ABC rejection 
model selection approaches, we can now turn our attention to real world scenarios 
that have not previously been considered from a Bayesian (exact or approximate) 
perspective. Here we consider models of signaling though the erythropoietin recep- 
tor (EpoR), transduced by STAT5 (Figure 4(a)) [23,24]. Signaling through this 
receptor is crucial for proliferation, differentiation, and survival of erythroid progen- 
itor cells [25]. When the Epo hormone binds to the EpoR receptor, the receptor's 
cytoplasmic domain is phosporylated, which creates a docking site for signaling 
molecules, in particular STAT5. Upon binding to the activated receptor, STAT5 
first becomes phosphorylated, then dimerizes and translocates to the nucleus, where 
it acts as a transcription factor. There have been competing hypotheses about what 
happens with the STAT5 in the nucleus. Originally it had been suggested that 
STAT5 gets degraded in the nucleus in an ubiquitin-asssociated way [26] , but other 
evidence suggests that they are dephosphorylated in the nucleus and then trafficked 
back to the cytoplasm [27]. 

The ambiguity of the shutoff mechanism of STAT5 in the nucleus triggered the 
development of several mathematical models [29,30,32] describing different hypothe- 
ses. All models assume mass action kinetics and denote the amount of activated 
Epo-receptors by EpoR a, monomeric unphosphorylated and phosporylated STAT5 
molecules by x\ and X2, respectively, dimeric phosphorylated STAT5 in the cyto- 
plasm by X3 and dimeric phosphorylated STAT5 in the nucleus by X4. The most 
basic model Timmer et al. developed, under the assumption that phosphorylated 
STAT5 does not leave the nucleus, consists of the following kinetic equations, 

x\ = —k\XiEpoRA (3) 
X2 = —k2x\ + k\X\EpoRA 

£4 = k 3 x 3 . (4) 

One can then assume that phosphorylated STAT5 dimers dissociate and leave the 
nucleus; this is modelled by adding appropriate kinetic terms to the equations (3) 
and (4) of the basic model to obtain 

xi = —k\X\EpoRA + 2/C4X4 
X4 = k 3 x 3 — /C4X4. 

The cycling model can be developed further by assuming a delay before STAT5 
leaves the nucleus: 

xi = —kiXiEpoRA + 2k^x 3 {t — r) 

±4 = k 3 x 3 - k4X 3 (t - t). (5) 

This model was chosen as the best model in the original analyses [29,30] based on a 
numerical evaluation of the likelihood, followed by a likelihood ratio test and boot- 
strap procedure for model selection. The data are partially observed time course 
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Fi gure 4: (a) STAT5 signaling pathway. Adapted from [28]. (b) Histograms show pop- 
ulations of the model parameter m. Population 20 represents the approximation of the 
marginal posterior distribution of m. Tolerance schedule: e = {200, 100, 50, 35, 30, 25, 22, 
20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8}. Perturbation kernels: KM t (m\m*) = 0.6 if 
m = m* and 0.2 otherwise; KP t {9\9*) = 9* + U(-a,a), a = 0.5(max{6>} t _i - min{6»} t _i). 

N = 500. Distance function: d(D ,D*) = ^ ( ^g^^ 

with D{) = {y i Q\y^} 1 D* = {y*^\y*^} and y^ the total amount of phosphoryalated 
STAT5 in the cytoplasm and ?/ 2 ' the total amount of STAT5 in the cytoplasm, and 

(2) n 

cr D are the associated confidence intervals; reassuringly, other distance functions, e.g. the 
square root of the sum of squared errors yield identical model selection results (data not 
shown) . 
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measurements of the total amount of STAT5 in the cytoplasm, and the amount of 
phosphorylated STAT5 in the cytoplasm; both are only known up to a normalizing 
factor. 

We propose a further model with clear physical interpretation where the delay acts 
on STAT5 inside the nucleus (X4) rather than on X3 (in equation (5)), for which a 
biological interpretation is difficult. Instead of xs(t — r), we propose to model the 
delay of phosphorylated STAT5 X4 in the nucleus directly and obtain [31]: 

x\ = — kix±EpoRA + 2k4Xi(t — r) 
X4 = ^3X3 — k^x^t — t). 

We perform the ABC SMC model selection algorithm on the following non-nested 
models: (1) Cycling delay model with xs(t — r), (2) Cycling delay model with 
X4(t — r), (3) Cycling model without a delay. The model parameter m can therefore 
take values 1, 2 and 3. 

For each proposed model and parameter combination we numerically solve the ODE 
equations of the model and add e ~ iV(0, a) to obtain the simulated time course 
data. The noise parameter a can be either fixed or treated as another parameter to 
be estimated; we consider the latter option, under the assumption that the experi- 
mental noise is independent and identically distributed for all time points. 

Figure 4(b) shows intermediate populations leading to the ABC SMC marginal 
posterior distribution over the model parameters m (population 20). Bayes factors 
can be calculated from the last population and according to the conventional in- 
terpretation of Bayes factors [33], it can be concluded that there is strong evidence 
in favour of model 3 compared to model 1, positive evidence in favour of model 3 
compared to model 2, and positive evidence in favour of model 2 compared to model 
1. Thus cycling appears to be clearly important and the model that receives the 
most support is the cycling model without a time-delay. Here the flexibility of ABC 
SMC has allowed us to perform simultaneous model selection on non-nested models 
of ordinary and time-delay differential equations. 

4 Discussion 

We have developed a novel model selection methodology based on approximate 
Bayesian computation and sequential Monte Carlo. The results obtained in our ap- 
plications illustrate the usefulness and wide applicability of our ABC SMC method, 
even when experimental data are scarce, there are no measurements for some of 
the species, temporal data are not measured at equidistant time points, and when 
parameters such as kinetic rates are unknown. In the context of dynamical systems 
our method can be applied across all simulation and modelling (including qualitative 
modelling) frameworks; for JAK-STAT signal transduction dynamics, for example, 
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we have been able to compare the relative explanatory power of ODE and time-delay 
differential equation models. Our model selection procedure is also not confined to 
dynamical systems; in fact the scope for application is immense and limited only by 
the availability of efficient simulation approaches. 

Routine application to complex models in systems, computational and population 
biology with hundreds or thousands of parameters [34] will require further numeri- 
cal developments due to the high computational cost of repeated simulations. SMC 
based ABC methods are, however, highly paralellizable and we believe that future 
work should exploit this property to make these methods computationally more ef- 
ficient. Further potential improvements might come from (i) regression adjustment 
techniques that have so far been applied in the parameter estimation ABC frame- 
work [4,35,36] (ii) from automatic generation of the tolerance schedules [37], and (hi) 
by developing more sophisticated perturbation kernels that exploit inherent prop- 
erties of biological dynamical systems such as sloppiness [38,39]; here especially we 
feel that there is substantial room for improvement as the likelihoods of dynamical 
systems contain information about the qualitative behaviour [40] which can also be 
exploited in ABC frameworks. 

5 Conclusion 

We conclude by emphasizing the need for inferential methods which can assess the 
relative performance and reliability of different models. The need for such reliable 
model selection procedures can hardly be overstated: with an increasing number of 
biomedical problems being studied using simulation approaches, there is an obvious 
and urgent need for statistically sound approaches that allow us to differentiate 
between different models. If parameters are known or the likelihood is available in 
a closed form, then the model selection is generally straightforward. However, for 
many of the most interesting systems biology (and generally, scientific) problems 
this is not the case and here ABC SMC can be employed. 
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Supplementary material A: Derivation of ABC SMC model 
selection algorithms 

We start this section by briefly reviewing the building blocks of the ABC SMC algorithm of Toni 
et al. [8], which is based on sequential importance sampling (SIS). The main idea of importance 
sampling is to sample from the desired target distribution n (which can be impossible or hard to 
sample from) indirectly through sampling from a proposal distribution r\ [41]. To get a sample from 
7r, one can instead sample from r) and weight the samples by importance weights 

T){x) 

In SIS one reaches the target distribution nr through a series of intermediate distributions, n t , 
t — 1, ... ,T — 1 [42, 43]. If it is hard to sample from these distributions one can use the idea 
of importance sampling described above to sample from a series of proposal distributions rjt and 
weight the obtained samples by importance weights 

wt(xt) = . . ■ 6) 
Vt(xt) 

In SIS the proposal distributions are defined as 

Vt{xt) — J r)t-i(x t -i)Kt(xt-i,xt)dx t -i, (7) 
where r)t-i is the previous proposal distribution and K t is a Markov kernel. 

To apply SIS, we need to define the intermediate and the proposal distributions. In an ABC 
framework [4,5], which is based on comparisons between simulated and experimental datasets, we 
define the intermediate distributions as [6, 8] 

n t (x) = ^-f^l(d(D ,D (b) (x))<e t ), (8) 

where P(x) denotes the prior distribution and -D(i), . . . , D(s t ) are B t > 1 data sets generated for 
a fixed parameter x, ~ f(D\x). l(x) is an indicator function and e t is the tolerance required 
from particles contributing to the intermediate distribution t. To simplify the notation we define 
bt(x) = ±Y:>± 1 Z{d(Do,D (b) (x))<e t ). 

We define the first proposal distribution to equal the prior distribution, rji(x) = P(x). The proposal 
distribution at time t (t = 2, . . . , T), rj t , is defined as 

r)t(xt) = t(P(x t ) > 0)l{b t (x t ) > Q) J ■K t _ 1 {x t - 1 )K t (x t \x t - 1 )dxt- 1 , (9) 

where K t denotes the perturbation kernel (e.g. random walk around the particle). For details of 
how this proposal distribution was obtained, see [8] . 

In the remainder of this section we introduce three different ways in which ABC SMC ideas pre- 
sented above can be used in the model selection framework. We start by proposing a simple and 
naive incorporation of the above building blocks for model selection. We then continue by deriving 
an ABC SMC model selection algorithm on the joint model and parameter space, which is presented 
in the methods section of the paper. In the end we present ABC SMC algorithm for approximation 
of the marginal likelihood, which can also be employed for model selection. 

The only of these three algorithms that we present in the main part of the paper and use in 
examples is algorithm II (ABC SMC model selection on the joint space), since the other two algo- 
rithms (I and III) are computationally too expensive and impractical to use. 
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I) ABC SMC m REJg model selection algorithm 



Very naively and stragihtforwardly the intermediate distributions can be defined as 



7Tt(m) = P(m)bmt(m), 



where 



bmt(m) : — 



E fl ~P( g | ro )l(ri(A>,-P(fl,m)) <e t ) 
£ fl ~™ M HPWm) > 0) 



This means that for each model m we calculate bmt(m) as the ratio between the number of accepted 
particles (where the distance falls below e t ) and all sampled particles, where parameters 9 of model 
m are sampled from the prior distribution P(9\m). 

If a set of candidate models M of a finite size \M\ is being considered, and N denotes the number 
of particles, then we can write the algorithm as follows: 

MSI Initialize ei, . . . , £t- 

Set the population indicator t = 1. 

MS2 For i = l,..., \M\, calculate the weights as 



MS3 Normalize the weights. 

If t < T, set t = t + 1, go to MS2. 

In this algorithm we estimate the posterior distribution of the model indicator m sequentially (i.e. 
using ideas from SIS), but the integration over model parameters is not sequential; we always sample 
them from the prior distribution P(6\m) (i.e. in the rejection sampling manner). This algorithm is 
therefore computationally very expensive. It would be computationally more efficient to generate 
0t by exploiting the knowledge about 6 that is contained in {6}t-i. In addition to learning m 
sequentially, i.e. by exploiting {m} t _i for generating m t , we would also like to learn 9 sequentially. 

In order to do this, we define 

II) ABC SMC model selection on the joint space 

Let (m, 9) denote a particle from a joint space, where m corresponds to the model indicator and 9 
are the parameters of model m. We define the intermediate distributions by 



In the following equations KM t denotes the perturbation kernel for the model parameter, KPt, m 
denotes the perturbation kernel for the parameters of model m, and t is the population number. 
Now we derive the sequential importance sampling weights 




7T t (m,0) = P{m,9)b t {m,9), 



where 




wt{m t ,9t) 



ittjmt, 9 t ) 
r) t (m t ,9t) ' 
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For a particle (m t ,0 t ) from population t, we define the proposal distribution rit{mt,O t ) as 

r, t (m t ,8 t ) = l(P(m t ,8 t )>0)l(b t (m t ,8 t )>0) (10) 
x / nt-i(mt-i)KMt(mt\mt-i)dm t -i 

Jm t _ 1 

x f irt-i{p t -i)KPt(Ot\e t -i)dBt-i 

J6 t -i |m t _i=m t 

oc l(P(mt,Ot)>0)l(bt(mt,O t )>0) 



\M\ 

X 

3 = 1 

(k) 



^P,-,(mii , ,)™,(»>.|n>, < i , i) 



- t e P,_ 1(m ;::^, ) ^^.c-ie\), 

where intermediate marginal model probabilities Pt(m) are defined as 

P t (mt = m) = ^ w t (m t ,6t). 

The weights for all accepted particles are (obtained by including (8) and (fO) in equation (6)) 
m(m t ,6 t ) = P(m t ,9 t )b t ( mt ,9 t ) _ 

E^J P t J M^t-l) KM t( m t\ m ?-l)T,k;rn t . 1 =r nt P^l^-^'l^) 

The resulting ABC SMC algorithm is presented in the methodology section of the main part of the 
paper. 

Ill) ABC SMC approximation of the marginal likelihood P(Do\m) 

If we can calculate the marginal likelihood P(Do\m) for each of the candidate models that we 
consider in the model selection problem, then we can calculate the marginal posterior distribution 
of a model m as 

- P(Do\m)P(m) 
[ l 0) Z m ,P{D \m')P(m>y (U) 
We now explain how to calculate P(Do\m) for model m. In the ABC rejection-based approach the 
posterior distribution of the parameters for each model m are estimated independently by employing 
ABC rejection; the marginal likelihood then equals the acceptance rate, 

_ , _ . . ^accepted particles given model m , . 

P(D \m) « , (12) 

i.e. the ratio between the number of accepted versus the number of proposed particles N m . We can 
use this marginal likelihood estimate to calculate P(m\D ) using equation (11). This approach has 
been used in [12]. 

We now derive how ABC SMC can be used for estimating the marginal likelihood, which can 
be then used for model selection. In a usual ABC SMC setting for drawing samples from the pos- 
terior parameter distribution P(0\m, Do) for a given model m, we define intermediate distributions 
as 

7T t (0) = P(0)l(d(£>o, D(9)) < e t ). (13) 
The target distribution tt t is an unnormalized approximation of the posterior distribution P(8\m, D ) 
We are now interested in its normalization constant, i.e. the marginal likelihood, 



P(D \m) w f n T (8)d8. 
Je 
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Let us call the integrals of n t (9), J e n t (9)d9, the intermediate marginal likelihoods. 

In the usual ABC SMC parameter estimation setting, our goal is to obtain samples from distribu- 
tion ttt{9), whereas our goal here is to obtain its normalization constant. While this distribution as 
defined in equation (f3) is in general unnormalized, the ABC SMC parameter estimation algorithm 
performs normalization of weights at every t and therefore returns its normalized version [8]. So 
we cannot use the usual output of ABC SMC directly. Instead we proceed as follows. 

We would like to draw particles from the following target distribution: 



where P{9) is the prior distribution. To draw samples from 7r we can use ABC SMC, where we 
define the intermediate distributions as 



In each population we accept N particles, and a particle is only rejected if it falls outside the 
boundaries of Tt. We classify the accepted particles in two sets, O] := {9; d(Do, D(9)) < et} and 
&t '■= {9; d(Do, D(9)) > e t }, depending on the distance reached. In each population t we can then 
calculate the intermediate marginal likelihoods by 



setting, where T = 1 and all weights are equal, this result corresponds to (12). 

After calculating P(Do\m) for each m, we can use equation (11) to calculate the marginal pos- 
terior distributions for model m, 



The model selection algorithm based on approximating the marginal likelihood proceeds as follows: 

Algorithm 

Ml For model mj, j = 1, . . . , \M\ do steps SI to S4. Then go to M2. 

SI Initialize ei, €t- 

Set the population indicator t = 1. 

52.0 Set the particle indicator i — 1. 

52.1 If t = 1, sample 9** independently from P(9). 

If t > 1, sample 9* from the previous population {# t '!_\} with weights w t -i and perturb the 
particle to obtain 9** ~ Kt(6\9*), where K t is a perturbation kernel. 
If P(9") = 0, return to S2.1. 

For a particle 9** simulate a candidate data set D and calculate d(Do, D{9**)). 

If d(D ,D(9**)) < e t , add 9" to Q\{rrij). If d(D ,D(9**)) > e t , add <9" to <£>1{mj). 

S2.1 Calculate the weight for particle 9^ — 9**: 



Tt{9) = P(9)l[d(D ,D(9)) < e T ]+P(9)l[d(D ,D(9)) > e T ], 



Tt{9) = P(9)l[d(D ,D(9)) < e t ] + P(9)l[d(D , D(9)) > e t ] 
= T?{0) + T?(e). 




P(m\D ) 




If i < N set i = i + 1, go to S2.1. 



20 



53 Normalize the weights. 

If t < T, set t = t + 1, go to S2.0. 

54 Calculate 

M2 For each mj calculate P(mj|Do) using equation 

P(Do|m)P(m) 



P(m\D ) = 



£ m , P(D \m>)P(m>) 



The computational advantage of this model selection algorithm compared to the marginal likelihood 
model selection based on ABC rejection can be obtained by (i) starting with a small number of 
particles N in population 1 and increasing it in each subsequent population. This way not much 
computational effort is spent on simulations in earlier populations, but we nevertheless have a big 
enough sample set in the last population to obtain a reliable estimate; (ii) exploiting the property 
that intermediate distributions in the parameter estimation framework should be included in one 
another, and so 

range Oj > range 0j + i, t = 1, . . . , T — 1. 

In other words, a proposed particle in population t cannot belong to Q] if it cannot be obtained 
by perturbing any of the particles in ©J_i- We can therefore reject some of the proposed particles 
without simulation. This means a huge saving in computational time, since simulations are the 
most expensive part of ABC based algortihms. However, one of the obvious ways to exploit this 
property would be to use a truncated perturbation kernel with ranges they cover being smaller than 
the range of prior distribution. But we find this unsatisfactory and, in the present form, feel that 
evaluating the marginal model likelihood directly is not practical. 



Supplementary material B: Tutorial on ABC rejection 
and ABC SMC for parameter estimation and model se- 
lection 

Available in arXiv (reference arXiv:0910.4472v2 [stat.CO]). 

Supplementary material C: Supplementary figures and 
datasets 

Available on the Bioinformatics webpagc. 
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